Overview

Dataset statistics

Statistic                       Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   420         431
Missing cells (%)               7.8%        8.1%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
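
The percentage and per-record figures in the dataset statistics follow from the raw counts. The sketch below recomputes them, assuming that "Missing cells (%)" is missing cells divided by rows times columns and that 1 KiB is 1024 bytes. The function names are illustrative and are not part of ydata-profiling.

```python
# Figures copied from the dataset statistics in this report.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 420, "Dataset B": 431}
TOTAL_SIZE_KIB = 45.3


def missing_cell_share(missing: int, n_rows: int, n_cols: int) -> float:
    """Return missing cells as a percentage of all cells (rows x columns)."""
    return 100.0 * missing / (n_rows * n_cols)


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean in-memory size of one row in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


for name, missing in MISSING_CELLS.items():
    share = missing_cell_share(missing, N_OBSERVATIONS, N_VARIABLES)
    print(f"{name}: {share:.1f}% missing cells")

size = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"Average record size: {size:.1f} B")
```

Rounded to one decimal place, the results agree with the reported 7.8%, 8.1% and 104.0 B.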

Variable types

Type          Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                                        Dataset B                                        Alert type
Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex   High Correlation
Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived   High Correlation
Age has 81 (18.2%) missing values                Age has 81 (18.2%) missing values                Missing
Cabin has 339 (76.0%) missing values             Cabin has 349 (78.3%) missing values             Missing
PassengerId has unique values                    PassengerId has unique values                    Unique
Name has unique values                           Name has unique values                           Unique
SibSp has 307 (68.8%) zeros                      SibSp has 307 (68.8%) zeros                      Zeros
Parch has 335 (75.1%) zeros                      Parch has 346 (77.6%) zeros                      Zeros
Fare has 5 (1.1%) zeros                          Fare has 7 (1.6%) zeros                          Zeros

Reproduction

Item                     Dataset A                    Dataset B
Analysis started         2023-07-19 17:52:21.133129   2023-07-19 17:52:25.277918
Analysis finished        2023-07-19 17:52:25.276663   2023-07-19 17:52:29.307974
Duration                 4.14 seconds                 4.03 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json

Variables

PassengerId
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           439.12332   432.13453
Minimum        1           3
Maximum        891         891
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     1           3
5-th percentile             41.25       45.25
Q1                          220.5       200
median                      432.5       429.5
Q3                          655.75      663.5
95-th percentile            847.25      848
Maximum                     891         891
Range                       890         888
Interquartile range (IQR)   435.25      463.5

Descriptive statistics

Statistic                         Dataset A      Dataset B
Standard deviation                255.48674      261.12986
Coefficient of variation (CV)     0.58181092     0.60427909
Kurtosis                          -1.1830659     -1.2224571
Mean                              439.12332      432.13453
Median Absolute Deviation (MAD)   216.5          232
Skewness                          0.071724899    0.084067407
Sum                               195849         192732
Variance                          65273.475      68188.804
Monotonicity                      Not monotonic  Not monotonic
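
Several of the PassengerId statistics are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks the reported figures against those textbook definitions. The dictionary layout and helper name are illustrative, and the tolerance is an assumption that allows for the rounding in the report.

```python
import math

# Reported PassengerId statistics, copied from this report.
REPORTED = {
    "Dataset A": {
        "min": 1, "max": 891, "q1": 220.5, "q3": 655.75,
        "mean": 439.12332, "std": 255.48674,
        "range": 890, "iqr": 435.25, "cv": 0.58181092, "variance": 65273.475,
    },
    "Dataset B": {
        "min": 3, "max": 891, "q1": 200, "q3": 663.5,
        "mean": 432.13453, "std": 261.12986,
        "range": 888, "iqr": 463.5, "cv": 0.60427909, "variance": 68188.804,
    },
}


def derived(stats: dict) -> dict:
    """Recompute the derived statistics from the primary ones."""
    return {
        "range": stats["max"] - stats["min"],
        "iqr": stats["q3"] - stats["q1"],
        "cv": stats["std"] / stats["mean"],
        "variance": stats["std"] ** 2,
    }


for name, stats in REPORTED.items():
    for key, value in derived(stats).items():
        # Reported values are rounded, so compare with a relative tolerance.
        ok = math.isclose(value, stats[key], rel_tol=1e-6)
        print(f"{name} {key}: recomputed {value:.6g}, consistent={ok}")
```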
Histogram with fixed size bins (bins=50)
Dataset A
Value                Count   Frequency (%)
780                  1       0.2%
82                   1       0.2%
308                  1       0.2%
219                  1       0.2%
49                   1       0.2%
91                   1       0.2%
383                  1       0.2%
112                  1       0.2%
766                  1       0.2%
249                  1       0.2%
Other values (436)   436     97.8%

Dataset B
Value                Count   Frequency (%)
614                  1       0.2%
158                  1       0.2%
168                  1       0.2%
608                  1       0.2%
556                  1       0.2%
136                  1       0.2%
832                  1       0.2%
148                  1       0.2%
10                   1       0.2%
881                  1       0.2%
Other values (436)   436     97.8%

Dataset A
Value   Count   Frequency (%)
1       1       0.2%
3       1       0.2%
8       1       0.2%
10      1       0.2%
12      1       0.2%
13      1       0.2%
16      1       0.2%
17      1       0.2%
18      1       0.2%
20      1       0.2%

Dataset B
Value   Count   Frequency (%)
3       1       0.2%
4       1       0.2%
8       1       0.2%
10      1       0.2%
11      1       0.2%
13      1       0.2%
16      1       0.2%
18      1       0.2%
21      1       0.2%
22      1       0.2%

Survived
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
0       269         262
1       177         184

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   1           0
2nd row   1           0
3rd row   0           0
4th row   1           1
5th row   0           1

Common Values

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       262     58.7%
1       184     41.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       262     58.7%
1       184     41.3%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       262     58.7%
1       184     41.3%

Most occurring categories

Dataset A
Value            Count   Frequency (%)
Decimal Number   446     100.0%

Dataset B
Value            Count   Frequency (%)
Decimal Number   446     100.0%

Most frequent character per category

Decimal Number

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       262     58.7%
1       184     41.3%

Most occurring scripts

Dataset A
Value    Count   Frequency (%)
Common   446     100.0%

Dataset B
Value    Count   Frequency (%)
Common   446     100.0%

Most frequent character per script

Common

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       262     58.7%
1       184     41.3%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII   446     100.0%

Dataset B
Value   Count   Frequency (%)
ASCII   446     100.0%

Most frequent character per block

ASCII

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       262     58.7%
1       184     41.3%

Pclass
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
3       246         244
1       118         104
2       82          98

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   1           3
2nd row   3           1
3rd row   3           1
4th row   1           3
5th row   2           3

Common Values

Dataset A
Value   Count   Frequency (%)
3       246     55.2%
1       118     26.5%
2       82      18.4%

Dataset B
Value   Count   Frequency (%)
3       244     54.7%
1       104     23.3%
2       98      22.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value   Count   Frequency (%)
3       246     55.2%
1       118     26.5%
2       82      18.4%

Dataset B
Value   Count   Frequency (%)
3       244     54.7%
1       104     23.3%
2       98      22.0%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
3       246     55.2%
1       118     26.5%
2       82      18.4%

Dataset B
Value   Count   Frequency (%)
3       244     54.7%
1       104     23.3%
2       98      22.0%

Most occurring categories

Dataset A
Value            Count   Frequency (%)
Decimal Number   446     100.0%

Dataset B
Value            Count   Frequency (%)
Decimal Number   446     100.0%

Most frequent character per category

Decimal Number

Dataset A
Value   Count   Frequency (%)
3       246     55.2%
1       118     26.5%
2       82      18.4%

Dataset B
Value   Count   Frequency (%)
3       244     54.7%
1       104     23.3%
2       98      22.0%

Most occurring scripts

Dataset A
Value    Count   Frequency (%)
Common   446     100.0%

Dataset B
Value    Count   Frequency (%)
Common   446     100.0%

Most frequent character per script

Common

Dataset A
Value   Count   Frequency (%)
3       246     55.2%
1       118     26.5%
2       82      18.4%

Dataset B
Value   Count   Frequency (%)
3       244     54.7%
1       104     23.3%
2       98      22.0%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII   446     100.0%

Dataset B
Value   Count   Frequency (%)
ASCII   446     100.0%

Most frequent character per block

ASCII

Dataset A
Value   Count   Frequency (%)
3       246     55.2%
1       118     26.5%
2       82      18.4%

Dataset B
Value   Count   Frequency (%)
3       244     54.7%
1       104     23.3%
2       98      22.0%

Name
Text

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      82          82
Median length   49          49.5
Mean length     26.912556   27.006726
Min length      12          12

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      12003       12045
Distinct characters   59          59
Distinct categories   7           7
Distinct scripts      2           2
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
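
As a rough illustration of how character properties can be read per code point, the sketch below counts the Unicode general categories in one Name value from Dataset B using Python's standard `unicodedata` module. It is not necessarily how ydata-profiling computes its tables, and the function name is illustrative. The standard library exposes general categories but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the Name values that appears in this report (Dataset B).
sample = "Horgan, Mr. John"

# Codes: Lu = Uppercase Letter, Ll = Lowercase Letter,
#        Zs = Space Separator,  Po = Other Punctuation.
for code, count in category_counts(sample).most_common():
    print(code, count)
```

The two-letter codes correspond to the category names used in the report, for example `Ll` for Lowercase Letter and `Zs` for Space Separator.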

Unique

Statistic    Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

Row       Dataset A                                               Dataset B
1st row   Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)   Horgan, Mr. John
2nd row   Coutts, Master. William Loch "William"                  Weir, Col. John
3rd row   Vande Velde, Mr. Johannes Joseph                        Lewy, Mr. Ervin G
4th row   Baxter, Mrs. James (Helene DeLaudeniere Chaput)         Madigan, Miss. Margaret "Maggie"
5th row   Slemen, Mr. Richard James                               Osman, Mrs. Mara

Dataset A
Value                Count   Frequency (%)
mr                   251     13.8%
miss                 96      5.3%
mrs                  60      3.3%
william              31      1.7%
master               26      1.4%
john                 24      1.3%
henry                17      0.9%
charles              14      0.8%
james                12      0.7%
thomas               12      0.7%
Other values (886)   1270    70.0%

Dataset B
Value                Count   Frequency (%)
mr                   248     13.7%
miss                 99      5.5%
mrs                  66      3.6%
william              35      1.9%
john                 28      1.5%
master               23      1.3%
henry                18      1.0%
charles              13      0.7%
james                12      0.7%
mary                 11      0.6%
Other values (896)   1257    69.4%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
(space)             1369    11.4%
r                   959     8.0%
a                   853     7.1%
e                   815     6.8%
i                   682     5.7%
s                   644     5.4%
n                   611     5.1%
M                   559     4.7%
l                   532     4.4%
o                   515     4.3%
Other values (49)   4464    37.2%

Dataset B
Value               Count   Frequency (%)
(space)             1366    11.3%
r                   944     7.8%
e                   871     7.2%
a                   850     7.1%
i                   699     5.8%
s                   677     5.6%
n                   652     5.4%
M                   564     4.7%
l                   543     4.5%
o                   493     4.1%
Other values (49)   4386    36.4%

Most occurring categories

Dataset A
Value               Count   Frequency (%)
Lowercase Letter    7724    64.4%
Uppercase Letter    1821    15.2%
Space Separator     1369    11.4%
Other Punctuation   949     7.9%
Close Punctuation   67      0.6%
Open Punctuation    67      0.6%
Dash Punctuation    6       < 0.1%

Dataset B
Value               Count   Frequency (%)
Lowercase Letter    7747    64.3%
Uppercase Letter    1817    15.1%
Space Separator     1366    11.3%
Other Punctuation   958     8.0%
Open Punctuation    76      0.6%
Close Punctuation   76      0.6%
Dash Punctuation    5       < 0.1%

Most frequent character per category

Space Separator

Dataset A
Value     Count   Frequency (%)
(space)   1369    100.0%

Dataset B
Value     Count   Frequency (%)
(space)   1366    100.0%

Lowercase Letter

Dataset A
Value               Count   Frequency (%)
r                   959     12.4%
a                   853     11.0%
e                   815     10.6%
i                   682     8.8%
s                   644     8.3%
n                   611     7.9%
l                   532     6.9%
o                   515     6.7%
t                   360     4.7%
h                   271     3.5%
Other values (16)   1482    19.2%

Dataset B
Value               Count   Frequency (%)
r                   944     12.2%
e                   871     11.2%
a                   850     11.0%
i                   699     9.0%
s                   677     8.7%
n                   652     8.4%
l                   543     7.0%
o                   493     6.4%
t                   334     4.3%
h                   260     3.4%
Other values (16)   1424    18.4%

Uppercase Letter

Dataset A
Value               Count   Frequency (%)
M                   559     30.7%
A                   136     7.5%
H                   102     5.6%
J                   100     5.5%
E                   94      5.2%
C                   89      4.9%
S                   85      4.7%
W                   75      4.1%
B                   73      4.0%
L                   60      3.3%
Other values (15)   448     24.6%

Dataset B
Value               Count   Frequency (%)
M                   564     31.0%
A                   120     6.6%
J                   115     6.3%
H                   101     5.6%
S                   88      4.8%
B                   83      4.6%
C                   81      4.5%
W                   72      4.0%
L                   71      3.9%
E                   70      3.9%
Other values (15)   452     24.9%

Other Punctuation

Dataset A
Value   Count   Frequency (%)
.       447     47.1%
,       446     47.0%
"       54      5.7%
'       2       0.2%

Dataset B
Value   Count   Frequency (%)
,       446     46.6%
.       446     46.6%
"       60      6.3%
'       6       0.6%

Close Punctuation

Dataset A
Value   Count   Frequency (%)
)       67      100.0%

Dataset B
Value   Count   Frequency (%)
)       76      100.0%

Open Punctuation

Dataset A
Value   Count   Frequency (%)
(       67      100.0%

Dataset B
Value   Count   Frequency (%)
(       76      100.0%

Dash Punctuation

Dataset A
Value   Count   Frequency (%)
-       6       100.0%

Dataset B
Value   Count   Frequency (%)
-       5       100.0%

Most occurring scripts

Dataset A
Value    Count   Frequency (%)
Latin    9545    79.5%
Common   2458    20.5%

Dataset B
Value    Count   Frequency (%)
Latin    9564    79.4%
Common   2481    20.6%

Most frequent character per script

Common

Dataset A
Value     Count   Frequency (%)
(space)   1369    55.7%
.         447     18.2%
,         446     18.1%
)         67      2.7%
(         67      2.7%
"         54      2.2%
-         6       0.2%
'         2       0.1%

Dataset B
Value     Count   Frequency (%)
(space)   1366    55.1%
,         446     18.0%
.         446     18.0%
(         76      3.1%
)         76      3.1%
"         60      2.4%
'         6       0.2%
-         5       0.2%

Latin

Dataset A
Value               Count   Frequency (%)
r                   959     10.0%
a                   853     8.9%
e                   815     8.5%
i                   682     7.1%
s                   644     6.7%
n                   611     6.4%
M                   559     5.9%
l                   532     5.6%
o                   515     5.4%
t                   360     3.8%
Other values (41)   3015    31.6%

Dataset B
Value               Count   Frequency (%)
r                   944     9.9%
e                   871     9.1%
a                   850     8.9%
i                   699     7.3%
s                   677     7.1%
n                   652     6.8%
M                   564     5.9%
l                   543     5.7%
o                   493     5.2%
t                   334     3.5%
Other values (41)   2937    30.7%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII   12003   100.0%

Dataset B
Value   Count   Frequency (%)
ASCII   12045   100.0%

Most frequent character per block

ASCII

Dataset A
Value               Count   Frequency (%)
(space)             1369    11.4%
r                   959     8.0%
a                   853     7.1%
e                   815     6.8%
i                   682     5.7%
s                   644     5.4%
n                   611     5.1%
M                   559     4.7%
l                   532     4.4%
o                   515     4.3%
Other values (49)   4464    37.2%

Dataset B
Value               Count   Frequency (%)
(space)             1366    11.3%
r                   944     7.8%
e                   871     7.2%
a                   850     7.1%
i                   699     5.8%
s                   677     5.6%
n                   652     5.4%
M                   564     4.7%
l                   543     4.5%
o                   493     4.1%
Other values (49)   4386    36.4%

Sex
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
male     289         279
female   157         167

Length

Statistic       Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7040359   4.7488789
Min length      4           4

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2098        2118
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   female      male
2nd row   male        male
3rd row   male        male
4th row   female      female
5th row   male        female

Common Values

Dataset A
Value    Count   Frequency (%)
male     289     64.8%
female   157     35.2%

Dataset B
Value    Count   Frequency (%)
male     279     62.6%
female   167     37.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value    Count   Frequency (%)
male     289     64.8%
female   157     35.2%

Dataset B
Value    Count   Frequency (%)
male     279     62.6%
female   167     37.4%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
e       603     28.7%
m       446     21.3%
a       446     21.3%
l       446     21.3%
f       157     7.5%

Dataset B
Value   Count   Frequency (%)
e       613     28.9%
m       446     21.1%
a       446     21.1%
l       446     21.1%
f       167     7.9%

Most occurring categories

Dataset A
Value              Count   Frequency (%)
Lowercase Letter   2098    100.0%

Dataset B
Value              Count   Frequency (%)
Lowercase Letter   2118    100.0%

Most frequent character per category

Lowercase Letter

Dataset A
Value   Count   Frequency (%)
e       603     28.7%
m       446     21.3%
a       446     21.3%
l       446     21.3%
f       157     7.5%

Dataset B
Value   Count   Frequency (%)
e       613     28.9%
m       446     21.1%
a       446     21.1%
l       446     21.1%
f       167     7.9%

Most occurring scripts

Dataset A
Value   Count   Frequency (%)
Latin   2098    100.0%

Dataset B
Value   Count   Frequency (%)
Latin   2118    100.0%

Most frequent character per script

Latin

Dataset A
Value   Count   Frequency (%)
e       603     28.7%
m       446     21.3%
a       446     21.3%
l       446     21.3%
f       157     7.5%

Dataset B
Value   Count   Frequency (%)
e       613     28.9%
m       446     21.1%
a       446     21.1%
l       446     21.1%
f       167     7.9%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII   2098    100.0%

Dataset B
Value   Count   Frequency (%)
ASCII   2118    100.0%

Most frequent character per block

ASCII

Dataset A
Value   Count   Frequency (%)
e       603     28.7%
m       446     21.3%
a       446     21.3%
l       446     21.3%
f       157     7.5%

Dataset B
Value   Count   Frequency (%)
e       613     28.9%
m       446     21.1%
a       446     21.1%
l       446     21.1%
f       167     7.9%

Age
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       76          73
Distinct (%)   20.8%       20.0%
Missing        81          81
Missing (%)    18.2%       18.2%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.678329   29.222137
Minimum        0.42        0.83
Maximum        80          80
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0.42        0.83
5-th percentile             4           4.2
Q1                          20          21
median                      28          28
Q3                          39          36.5
95-th percentile            57.8        52
Maximum                     80          80
Range                       79.58       79.17
Interquartile range (IQR)   19          15.5

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                14.909219       13.626778
Coefficient of variation (CV)     0.50236045      0.46631695
Kurtosis                          -0.041439202    0.69039792
Mean                              29.678329       29.222137
Median Absolute Deviation (MAD)   9               8
Skewness                          0.29529938      0.40503663
Sum                               10832.59        10666.08
Variance                          222.2848        185.68908
Monotonicity                      Not monotonic   Not monotonic
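
Because Age has missing values, its mean and sum cover only the non-missing observations. The sketch below recovers that count from the reported sum and mean (count = sum / mean) and compares it with observations minus missing values. It assumes missing values are excluded from both figures, and the names are illustrative.

```python
# Figures copied from the Age statistics in this report.
N_OBSERVATIONS = 446
AGE = {
    "Dataset A": {"missing": 81, "sum": 10832.59, "mean": 29.678329},
    "Dataset B": {"missing": 81, "sum": 10666.08, "mean": 29.222137},
}


def implied_count(total: float, mean: float) -> int:
    """Number of values implied by a sum and a mean (count = sum / mean)."""
    return round(total / mean)


for name, stats in AGE.items():
    n = implied_count(stats["sum"], stats["mean"])
    expected = N_OBSERVATIONS - stats["missing"]
    print(f"{name}: implied count {n}, observations minus missing {expected}")
```

For both datasets the implied count is 365, which equals 446 observations minus 81 missing values.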
Histogram with fixed size bins (bins=50)
Dataset A
Value               Count   Frequency (%)
22                  16      3.6%
24                  15      3.4%
18                  13      2.9%
30                  13      2.9%
25                  13      2.9%
33                  11      2.5%
21                  11      2.5%
35                  11      2.5%
27                  11      2.5%
28                  11      2.5%
Other values (66)   240     53.8%
(Missing)           81      18.2%

Dataset B
Value               Count   Frequency (%)
22                  16      3.6%
19                  16      3.6%
28                  14      3.1%
24                  14      3.1%
21                  14      3.1%
27                  13      2.9%
30                  13      2.9%
29                  12      2.7%
25                  12      2.7%
32                  12      2.7%
Other values (63)   229     51.3%
(Missing)           81      18.2%

Dataset A
Value   Count   Frequency (%)
0.42    1       0.2%
0.75    1       0.2%
0.92    1       0.2%
1       5       1.1%
2       5       1.1%
3       3       0.7%
4       7       1.6%
5       3       0.7%
7       2       0.4%
8       4       0.9%

Dataset B
Value   Count   Frequency (%)
0.83    2       0.4%
0.92    1       0.2%
1       6       1.3%
2       3       0.7%
3       4       0.9%
4       3       0.7%
5       3       0.7%
7       1       0.2%
8       2       0.4%
9       4       0.9%

SibSp
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.47757848   0.46636771
Minimum        0            0
Maximum        5            8
Zeros          307          307
Zeros (%)      68.8%        68.8%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            2           2
Maximum                     5           8
Range                       5           8
Interquartile range (IQR)   1           1

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                0.92563466      0.93975315
Coefficient of variation (CV)     1.9381834       2.0150476
Kurtosis                          8.1336245       15.366636
Mean                              0.47757848      0.46636771
Median Absolute Deviation (MAD)   0               0
Skewness                          2.7043685       3.3533975
Sum                               213             208
Variance                          0.85679952      0.88313599
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
Dataset A
Value   Count   Frequency (%)
0       307     68.8%
1       104     23.3%
2       14      3.1%
4       10      2.2%
3       7       1.6%
5       4       0.9%

Dataset B
Value   Count   Frequency (%)
0       307     68.8%
1       109     24.4%
2       10      2.2%
3       9       2.0%
4       6       1.3%
5       4       0.9%
8       1       0.2%

Dataset A
Value   Count   Frequency (%)
0       307     68.8%
1       104     23.3%
2       14      3.1%
3       7       1.6%
4       10      2.2%
5       4       0.9%

Dataset B
Value   Count   Frequency (%)
0       307     68.8%
1       109     24.4%
2       10      2.2%
3       9       2.0%
4       6       1.3%
5       4       0.9%
8       1       0.2%

Parch
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       6            6
Distinct (%)   1.3%         1.3%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.39461883   0.34304933
Minimum        0            0
Maximum        5            6
Zeros          335          346
Zeros (%)      75.1%        77.6%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   0           0

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                0.79400434      0.72903423
Coefficient of variation (CV)     2.0120792       2.1251586
Kurtosis                          6.857774        10.095701
Mean                              0.39461883      0.34304933
Median Absolute Deviation (MAD)   0               0
Skewness                          2.392104        2.6541143
Sum                               176             153
Variance                          0.63044289      0.53149091
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
Dataset A
Value   Count   Frequency (%)
0       335     75.1%
1       60      13.5%
2       43      9.6%
3       4       0.9%
4       2       0.4%
5       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       346     77.6%
1       55      12.3%
2       41      9.2%
3       2       0.4%
4       1       0.2%
6       1       0.2%

Dataset A
Value   Count   Frequency (%)
0       335     75.1%
1       60      13.5%
2       43      9.6%
3       4       0.9%
4       2       0.4%
5       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       346     77.6%
1       55      12.3%
2       41      9.2%
3       2       0.4%
4       1       0.2%
6       1       0.2%

Ticket
Text

Statistic      Dataset A   Dataset B
Distinct       380         386
Distinct (%)   85.2%       86.5%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.8946188   6.8766816
Min length      3           3

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      3075        3067
Distinct characters   35          31
Distinct categories   5           5
Distinct scripts      2           2
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       329         338
Unique (%)   73.8%       75.8%

Sample

Row       Dataset A    Dataset B
1st row   24160        370377
2nd row   C.A. 37671   113800
3rd row   345780       PC 17612
4th row   PC 17558     370370
5th row   28206        349244

Dataset A
Value                Count   Frequency (%)
pc                   39      6.7%
c.a                  10      1.7%
a/5                  8       1.4%
ston/o               8       1.4%
2                    8       1.4%
ston/o2              6       1.0%
soton/o.q            5       0.9%
ca                   5       0.9%
w./c                 5       0.9%
a/4                  5       0.9%
Other values (402)   480     82.9%

Dataset B
Value                Count   Frequency (%)
pc                   25      4.4%
c.a                  11      1.9%
ston/o               10      1.8%
2                    10      1.8%
a/5                  8       1.4%
ca                   7       1.2%
w./c                 7       1.2%
sc/paris             6       1.1%
ston/o2              6       1.1%
2144                 5       0.9%
Other values (407)   476     83.4%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
3                   373     12.1%
1                   368     12.0%
2                   306     10.0%
7                   247     8.0%
4                   225     7.3%
6                   222     7.2%
5                   196     6.4%
0                   194     6.3%
9                   156     5.1%
8                   137     4.5%
Other values (25)   651     21.2%

Dataset B
Value               Count   Frequency (%)
3                   364     11.9%
1                   356     11.6%
2                   313     10.2%
7                   245     8.0%
4                   233     7.6%
0                   221     7.2%
6                   196     6.4%
5                   186     6.1%
9                   157     5.1%
8                   142     4.6%
Other values (21)   654     21.3%

Most occurring categories

Dataset A
Value               Count   Frequency (%)
Decimal Number      2424    78.8%
Uppercase Letter    356     11.6%
Other Punctuation   149     4.8%
Space Separator     133     4.3%
Lowercase Letter    13      0.4%

Dataset B
Value               Count   Frequency (%)
Decimal Number      2413    78.7%
Uppercase Letter    351     11.4%
Other Punctuation   165     5.4%
Space Separator     125     4.1%
Lowercase Letter    13      0.4%
Most frequent character per category

Decimal Number

Dataset A
Value   Count   Frequency (%)
3       373     15.4%
1       368     15.2%
2       306     12.6%
7       247     10.2%
4       225     9.3%
6       222     9.2%
5       196     8.1%
0       194     8.0%
9       156     6.4%
8       137     5.7%

Dataset B
Value   Count   Frequency (%)
3       364     15.1%
1       356     14.8%
2       313     13.0%
7       245     10.2%
4       233     9.7%
0       221     9.2%
6       196     8.1%
5       186     7.7%
9       157     6.5%
8       142     5.9%

Space Separator

Dataset A
Value     Count   Frequency (%)
(space)   133     100.0%

Dataset B
Value     Count   Frequency (%)
(space)   125     100.0%

Other Punctuation

Dataset A
Value   Count   Frequency (%)
.       92      61.7%
/       57      38.3%

Dataset B
Value   Count   Frequency (%)
.       108     65.5%
/       57      34.5%

Uppercase Letter

Dataset A
Value              Count   Frequency (%)
C                  80      22.5%
O                  59      16.6%
P                  54      15.2%
S                  42      11.8%
A                  38      10.7%
N                  24      6.7%
T                  23      6.5%
W                  9       2.5%
Q                  9       2.5%
F                  4       1.1%
Other values (6)   14      3.9%

Dataset B
Value              Count   Frequency (%)
C                  72      20.5%
O                  62      17.7%
P                  49      14.0%
S                  45      12.8%
A                  35      10.0%
N                  26      7.4%
T                  24      6.8%
W                  11      3.1%
Q                  7       2.0%
I                  6       1.7%
Other values (4)   14      4.0%

Lowercase Letter

Dataset A
Value   Count   Frequency (%)
a       4       30.8%
s       3       23.1%
r       2       15.4%
i       2       15.4%
l       1       7.7%
e       1       7.7%

Dataset B
Value   Count   Frequency (%)
a       4       30.8%
i       3       23.1%
s       3       23.1%
r       3       23.1%

Most occurring scripts

Dataset A
Value    Count   Frequency (%)
Common   2706    88.0%
Latin    369     12.0%

Dataset B
Value    Count   Frequency (%)
Common   2703    88.1%
Latin    364     11.9%

Most frequent character per script

Common

Dataset A
Value              Count   Frequency (%)
3                  373     13.8%
1                  368     13.6%
2                  306     11.3%
7                  247     9.1%
4                  225     8.3%
6                  222     8.2%
5                  196     7.2%
0                  194     7.2%
9                  156     5.8%
8                  137     5.1%
Other values (3)   282     10.4%

Dataset B
Value              Count   Frequency (%)
3                  364     13.5%
1                  356     13.2%
2                  313     11.6%
7                  245     9.1%
4                  233     8.6%
0                  221     8.2%
6                  196     7.3%
5                  186     6.9%
9                  157     5.8%
8                  142     5.3%
Other values (3)   290     10.7%

Latin

Dataset A
Value               Count   Frequency (%)
C                   80      21.7%
O                   59      16.0%
P                   54      14.6%
S                   42      11.4%
A                   38      10.3%
N                   24      6.5%
T                   23      6.2%
W                   9       2.4%
Q                   9       2.4%
a                   4       1.1%
Other values (12)   27      7.3%

Dataset B
Value              Count   Frequency (%)
C                  72      19.8%
O                  62      17.0%
P                  49      13.5%
S                  45      12.4%
A                  35      9.6%
N                  26      7.1%
T                  24      6.6%
W                  11      3.0%
Q                  7       1.9%
I                  6       1.6%
Other values (8)   27      7.4%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII   3075    100.0%

Dataset B
Value   Count   Frequency (%)
ASCII   3067    100.0%

Most frequent character per block

ASCII

Dataset A
Value               Count   Frequency (%)
3                   373     12.1%
1                   368     12.0%
2                   306     10.0%
7                   247     8.0%
4                   225     7.3%
6                   222     7.2%
5                   196     6.4%
0                   194     6.3%
9                   156     5.1%
8                   137     4.5%
Other values (25)   651     21.2%

Dataset B
Value               Count   Frequency (%)
3                   364     11.9%
1                   356     11.6%
2                   313     10.2%
7                   245     8.0%
4                   233     7.6%
0                   221     7.2%
6                   196     6.4%
5                   186     6.1%
9                   157     5.1%
8                   142     4.6%
Other values (21)   654     21.3%

Fare
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       183         183
Distinct (%)   41.0%       41.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           34.214321   31.734818
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          5           7
Zeros (%)      1.1%        1.6%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.225
Q1                          7.8958      7.8958
median                      14.5        14.25415
Q3                          32.221875   31
95-th percentile            120         108.28125
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   24.326075   23.1042

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                53.579964       48.442495
Coefficient of variation (CV)     1.5660099       1.5264778
Kurtosis                          31.262231       28.365234
Mean                              34.214321       31.734818
Median Absolute Deviation (MAD)   7.2604          6.52085
Skewness                          4.6845301       4.4287539
Sum                               15259.587       14153.729
Variance                          2870.8126       2346.6754
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A
Value                Count   Frequency (%)
7.8958               23      5.2%
13                   20      4.5%
8.05                 18      4.0%
7.75                 15      3.4%
7.25                 10      2.2%
26                   10      2.2%
7.925                9       2.0%
7.2292               9       2.0%
10.5                 8       1.8%
26.55                8       1.8%
Other values (173)   316     70.9%

Dataset B
Value                Count   Frequency (%)
7.75                 23      5.2%
13                   23      5.2%
26                   19      4.3%
8.05                 18      4.0%
7.8958               17      3.8%
10.5                 13      2.9%
7.925                12      2.7%
7.8542               8       1.8%
8.6625               8       1.8%
7.25                 8       1.8%
Other values (173)   297     66.6%

Dataset A
Value    Count   Frequency (%)
0        5       1.1%
5        1       0.2%
6.2375   1       0.2%
6.4375   1       0.2%
6.45     1       0.2%
6.4958   2       0.4%
6.75     1       0.2%
6.975    1       0.2%
7.05     6       1.3%
7.125    3       0.7%

Dataset B
Value    Count   Frequency (%)
0        7       1.6%
5        1       0.2%
6.4958   1       0.2%
6.75     1       0.2%
6.8583   1       0.2%
6.975    1       0.2%
7.05     4       0.9%
7.0542   1       0.2%
7.125    4       0.9%
7.225    4       0.9%

Cabin
Text

Statistic      Dataset A   Dataset B
Distinct       90          81
Distinct (%)   84.1%       83.5%
Missing        339         349
Missing (%)    76.0%       78.3%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.728972    3.6597938
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      399         355
Distinct characters   18          19
Distinct categories   3           3
Distinct scripts      2           2
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       73          66
Unique (%)   68.2%       68.0%

Sample

Row       Dataset A   Dataset B
1st row   B3          B42
2nd row   B58 B60     B5
3rd row   E17         B51 B53 B55
4th row   A26         B39
5th row   E68         D56

Dataset A
Value               Count   Frequency (%)
f                   3       2.3%
f2                  2       1.6%
b53                 2       1.6%
f4                  2       1.6%
b98                 2       1.6%
b96                 2       1.6%
d33                 2       1.6%
b58                 2       1.6%
c68                 2       1.6%
c27                 2       1.6%
Other values (93)   108     83.7%

Dataset B
Value               Count   Frequency (%)
c23                 3       2.6%
c27                 3       2.6%
c25                 3       2.6%
b49                 2       1.8%
e33                 2       1.8%
e44                 2       1.8%
f2                  2       1.8%
b77                 2       1.8%
e101                2       1.8%
c83                 2       1.8%
Other values (83)   91      79.8%

Most occurring characters

Dataset A
Value              Count   Frequency (%)
C                  39      9.8%
2                  37      9.3%
3                  36      9.0%
B                  34      8.5%
1                  30      7.5%
6                  24      6.0%
8                  23      5.8%
4                  22      5.5%
5                  22      5.5%
(space)            22      5.5%
Other values (8)   110     27.6%

Dataset B
Value              Count   Frequency (%)
C                  38      10.7%
2                  37      10.4%
1                  32      9.0%
3                  31      8.7%
B                  28      7.9%
6                  24      6.8%
5                  20      5.6%
7                  20      5.6%
4                  20      5.6%
E                  19      5.4%
Other values (9)   86      24.2%

Most occurring categories

Dataset A
Value              Count   Frequency (%)
Decimal Number     248     62.2%
Uppercase Letter   129     32.3%
Space Separator    22      5.5%

Dataset B
Value              Count   Frequency (%)
Decimal Number     224     63.1%
Uppercase Letter   114     32.1%
Space Separator    17      4.8%

Most frequent character per category

Uppercase Letter

Dataset A
Value   Count   Frequency (%)
C       39      30.2%
B       34      26.4%
D       20      15.5%
E       13      10.1%
A       10      7.8%
F       9       7.0%
G       4       3.1%

Dataset B
Value   Count   Frequency (%)
C       38      33.3%
B       28      24.6%
E       19      16.7%
D       11      9.6%
A       9       7.9%
F       6       5.3%
G       2       1.8%
T       1       0.9%

Decimal Number

Dataset A
Value   Count   Frequency (%)
2       37      14.9%
3       36      14.5%
1       30      12.1%
6       24      9.7%
8       23      9.3%
4       22      8.9%
5       22      8.9%
9       21      8.5%
7       17      6.9%
0       16      6.5%

Dataset B
Value   Count   Frequency (%)
2       37      16.5%
1       32      14.3%
3       31      13.8%
6       24      10.7%
5       20      8.9%
7       20      8.9%
4       20      8.9%
9       15      6.7%
0       14      6.2%
8       11      4.9%

Space Separator

Dataset A
Value     Count   Frequency (%)
(space)   22      100.0%

Dataset B
Value     Count   Frequency (%)
(space)   17      100.0%

Most occurring scripts

Dataset A
Value    Count   Frequency (%)
Common   270     67.7%
Latin    129     32.3%

Dataset B
Value    Count   Frequency (%)
Common   241     67.9%
Latin    114     32.1%

Most frequent character per script

Latin

Dataset A
Value   Count   Frequency (%)
C       39      30.2%
B       34      26.4%
D       20      15.5%
E       13      10.1%
A       10      7.8%
F       9       7.0%
G       4       3.1%

Dataset B
Value   Count   Frequency (%)
C       38      33.3%
B       28      24.6%
E       19      16.7%
D       11      9.6%
A       9       7.9%
F       6       5.3%
G       2       1.8%
T       1       0.9%

Common

Dataset A
Value     Count   Frequency (%)
2         37      13.7%
3         36      13.3%
1         30      11.1%
6         24      8.9%
8         23      8.5%
4         22      8.1%
5         22      8.1%
(space)   22      8.1%
9         21      7.8%
7         17      6.3%

Dataset B
Value     Count   Frequency (%)
2         37      15.4%
1         32      13.3%
3         31      12.9%
6         24      10.0%
5         20      8.3%
7         20      8.3%
4         20      8.3%
(space)   17      7.1%
9         15      6.2%
0         14      5.8%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII   399     100.0%

Dataset B
Value   Count   Frequency (%)
ASCII   355     100.0%

Most frequent character per block

ASCII

Dataset A
Value              Count   Frequency (%)
C                  39      9.8%
2                  37      9.3%
3                  36      9.0%
B                  34      8.5%
1                  30      7.5%
6                  24      6.0%
8                  23      5.8%
4                  22      5.5%
5                  22      5.5%
(space)            22      5.5%
Other values (8)   110     27.6%

Dataset B
Value              Count   Frequency (%)
C                  38      10.7%
2                  37      10.4%
1                  32      9.0%
3                  31      8.7%
B                  28      7.9%
6                  24      6.8%
5                  20      5.6%
7                  20      5.6%
4                  20      5.6%
E                  19      5.4%
Other values (9)   86      24.2%

Embarked
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           1
Missing (%)    0.0%        0.2%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
S       310         326
C       100         79
Q       36          40

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   S           Q
2nd row   S           S
3rd row   S           C
4th row   C           Q
5th row   S           S

Common Values

Dataset A
Value   Count   Frequency (%)
S       310     69.5%
C       100     22.4%
Q       36      8.1%

Dataset B
Value       Count   Frequency (%)
S           326     73.1%
C           79      17.7%
Q           40      9.0%
(Missing)   1       0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A
Value   Count   Frequency (%)
s       310     69.5%
c       100     22.4%
q       36      8.1%

Dataset B
Value   Count   Frequency (%)
s       326     73.3%
c       79      17.8%
q       40      9.0%

Most occurring characters

ValueCountFrequency (%)
S 310
69.5%
C 100
 
22.4%
Q 36
 
8.1%
ValueCountFrequency (%)
S 326
73.3%
C 79
 
17.8%
Q 40
 
9.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 446
100.0%
ValueCountFrequency (%)
Uppercase Letter 445
100.0%

Most frequent character per category

Uppercase Letter
Dataset A
Value | Count | Frequency (%)
S | 310 | 69.5%
C | 100 | 22.4%
Q | 36 | 8.1%

Dataset B
Value | Count | Frequency (%)
S | 326 | 73.3%
C | 79 | 17.8%
Q | 40 | 9.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Latin | 445 | 100.0%

Most frequent character per script

Latin
Dataset A
Value | Count | Frequency (%)
S | 310 | 69.5%
C | 100 | 22.4%
Q | 36 | 8.1%

Dataset B
Value | Count | Frequency (%)
S | 326 | 73.3%
C | 79 | 17.8%
Q | 40 | 9.0%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Most frequent character per block

ASCII
Dataset A
Value | Count | Frequency (%)
S | 310 | 69.5%
C | 100 | 22.4%
Q | 36 | 8.1%

Dataset B
Value | Count | Frequency (%)
S | 326 | 73.3%
C | 79 | 17.8%
Q | 40 | 9.0%

Interactions

[Figures: interaction plots, 25 each for Dataset A and Dataset B]

Correlations

[Figures: correlation heatmaps for Dataset A and Dataset B]

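Dataset A

Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.033 | -0.025 | 0.039 | 0.011 | 0.097 | 0.066 | 0.105 | 0.000
Age | 0.033 | 1.000 | -0.156 | -0.251 | 0.185 | 0.155 | 0.310 | 0.117 | 0.092
SibSp | -0.025 | -0.156 | 1.000 | 0.437 | 0.463 | 0.217 | 0.154 | 0.203 | 0.154
Parch | 0.039 | -0.251 | 0.437 | 1.000 | 0.422 | 0.133 | 0.000 | 0.215 | 0.089
Fare | 0.011 | 0.185 | 0.463 | 0.422 | 1.000 | 0.335 | 0.493 | 0.199 | 0.186
Survived | 0.097 | 0.155 | 0.217 | 0.133 | 0.335 | 1.000 | 0.368 | 0.538 | 0.191
Pclass | 0.066 | 0.310 | 0.154 | 0.000 | 0.493 | 0.368 | 1.000 | 0.144 | 0.240
Sex | 0.105 | 0.117 | 0.203 | 0.215 | 0.199 | 0.538 | 0.144 | 1.000 | 0.083
Embarked | 0.000 | 0.092 | 0.154 | 0.089 | 0.186 | 0.191 | 0.240 | 0.083 | 1.000

Dataset B

Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.071 | -0.079 | -0.017 | -0.038 | 0.139 | 0.029 | 0.000 | 0.000
Age | 0.071 | 1.000 | -0.223 | -0.280 | 0.071 | 0.173 | 0.217 | 0.088 | 0.049
SibSp | -0.079 | -0.223 | 1.000 | 0.434 | 0.464 | 0.156 | 0.140 | 0.150 | 0.084
Parch | -0.017 | -0.280 | 0.434 | 1.000 | 0.407 | 0.118 | 0.000 | 0.146 | 0.023
Fare | -0.038 | 0.071 | 0.464 | 0.407 | 1.000 | 0.249 | 0.499 | 0.170 | 0.166
Survived | 0.139 | 0.173 | 0.156 | 0.118 | 0.249 | 1.000 | 0.322 | 0.531 | 0.168
Pclass | 0.029 | 0.217 | 0.140 | 0.000 | 0.499 | 0.322 | 1.000 | 0.118 | 0.276
Sex | 0.000 | 0.088 | 0.150 | 0.146 | 0.170 | 0.531 | 0.118 | 1.000 | 0.168
Embarked | 0.000 | 0.049 | 0.084 | 0.023 | 0.166 | 0.168 | 0.276 | 0.168 | 1.000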
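
To relate these matrices to the High Correlation alerts in the overview, the sketch below scans the upper triangle of a correlation matrix for pairs at or above a cutoff. The cutoff of 0.5 and the `high_correlation_pairs` helper are assumptions made for illustration; the report does not state the rule it uses. The values are a four-variable sub-matrix copied from the Dataset A figures.

```python
from itertools import combinations


def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (name_a, name_b, value) for each pair with |value| >= threshold.

    Only the upper triangle is scanned, so each pair appears once and the
    diagonal (always 1.0) is skipped.
    """
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(matrix[i][j]) >= threshold:
            pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


names = ["Fare", "Survived", "Pclass", "Sex"]
# Sub-matrix of the Dataset A correlations.
matrix_a = [
    [1.000, 0.335, 0.493, 0.199],
    [0.335, 1.000, 0.368, 0.538],
    [0.493, 0.368, 1.000, 0.144],
    [0.199, 0.538, 0.144, 1.000],
]

print(high_correlation_pairs(names, matrix_a))  # [('Survived', 'Sex', 0.538)]
```

Under that assumed cutoff, Survived and Sex is the only flagged pair in this sub-matrix, which is consistent with the alert. Fare and Pclass (0.493) falls just below it.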

Missing values

[Figures, Dataset A and Dataset B] A simple visualization of nullity by column.

[Figures, Dataset A and Dataset B] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

[Figures, Dataset A and Dataset B] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
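
As a minimal sketch of what nullity correlation means, the code below encodes each column as a 0/1 "is missing" indicator and computes the Pearson correlation between two indicators. The helpers `pearson` and `nullity_correlation` and the four `rows` are hypothetical and do not come from either dataset. How the report's plotting code handles edge cases such as columns with no missing values is not shown in the source, so returning `None` there is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no variance.
        return None
    return cov / sqrt(var_x * var_y)


def nullity_correlation(rows, col_a, col_b):
    """Correlate the missing-value indicators of two columns."""
    a = [1 if row[col_a] is None else 0 for row in rows]
    b = [1 if row[col_b] is None else 0 for row in rows]
    return pearson(a, b)


# Hypothetical rows; None marks a missing cell.
rows = [
    {"PassengerId": 1, "Age": 22.0, "Cabin": None},
    {"PassengerId": 2, "Age": None, "Cabin": None},
    {"PassengerId": 3, "Age": 38.0, "Cabin": "C85"},
    {"PassengerId": 4, "Age": None, "Cabin": None},
]

print(nullity_correlation(rows, "Age", "Cabin"))
print(nullity_correlation(rows, "PassengerId", "Age"))  # None: no missing ids
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is absent.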

Sample

Dataset A

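(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | female | 43.0 | 0 | 1 | 24160 | 211.3375 | B3 | S
348 | 349 | 1 | 3 | Coutts, Master. William Loch "William" | male | 3.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S
752 | 753 | 0 | 3 | Vande Velde, Mr. Johannes Joseph | male | 33.0 | 0 | 0 | 345780 | 9.5000 | NaN | S
299 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C
812 | 813 | 0 | 2 | Slemen, Mr. Richard James | male | 35.0 | 0 | 0 | 28206 | 10.5000 | NaN | S
857 | 858 | 1 | 1 | Daly, Mr. Peter Denis | male | 51.0 | 0 | 0 | 113055 | 26.5500 | E17 | S
126 | 127 | 0 | 3 | McMahon, Mr. Martin | male | NaN | 0 | 0 | 370372 | 7.7500 | NaN | Q
647 | 648 | 1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56.0 | 0 | 0 | 13213 | 35.5000 | A26 | C
289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S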

Dataset B

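(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
613 | 614 | 0 | 3 | Horgan, Mr. John | male | NaN | 0 | 0 | 370377 | 7.7500 | NaN | Q
694 | 695 | 0 | 1 | Weir, Col. John | male | 60.0 | 0 | 0 | 113800 | 26.5500 | NaN | S
295 | 296 | 0 | 1 | Lewy, Mr. Ervin G | male | NaN | 0 | 0 | PC 17612 | 27.7208 | NaN | C
198 | 199 | 1 | 3 | Madigan, Miss. Margaret "Maggie" | female | NaN | 0 | 0 | 370370 | 7.7500 | NaN | Q
797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S
114 | 115 | 0 | 3 | Attalah, Miss. Malake | female | 17.0 | 0 | 0 | 2627 | 14.4583 | NaN | C
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S
69 | 70 | 0 | 3 | Kink, Mr. Vincenz | male | 26.0 | 2 | 0 | 315151 | 8.6625 | NaN | S
730 | 731 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S
381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.0 | 0 | 2 | 2653 | 15.7417 | NaN | C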

Dataset A

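(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN | 0 | 0 | 358585 | 14.5000 | NaN | S
626 | 627 | 0 | 2 | Kirkland, Rev. Charles Leonard | male | 57.0 | 0 | 0 | 219533 | 12.3500 | NaN | Q
549 | 550 | 1 | 2 | Davies, Master. John Morgan Jr | male | 8.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S
341 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S
481 | 482 | 0 | 2 | Frost, Mr. Anthony Wood "Archie" | male | NaN | 0 | 0 | 239854 | 0.0000 | NaN | S
148 | 149 | 0 | 2 | Navratil, Mr. Michel ("Louis M Hoffman") | male | 36.5 | 0 | 2 | 230080 | 26.0000 | F2 | S
435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
826 | 827 | 0 | 3 | Lam, Mr. Len | male | NaN | 0 | 0 | 1601 | 56.4958 | NaN | S
287 | 288 | 0 | 3 | Naidenoff, Mr. Penko | male | 22.0 | 0 | 0 | 349206 | 7.8958 | NaN | S
470 | 471 | 0 | 3 | Keefe, Mr. Arthur | male | NaN | 0 | 0 | 323592 | 7.2500 | NaN | S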

Dataset B

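(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C
573 | 574 | 1 | 3 | Kelly, Miss. Mary | female | NaN | 0 | 0 | 14312 | 7.7500 | NaN | Q
788 | 789 | 1 | 3 | Dean, Master. Bertram Vere | male | 1.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S
391 | 392 | 1 | 3 | Jansson, Mr. Carl Olof | male | 21.0 | 0 | 0 | 350034 | 7.7958 | NaN | S
563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S
369 | 370 | 1 | 1 | Aubart, Mme. Leontine Pauline | female | 24.0 | 0 | 0 | PC 17477 | 69.3000 | B35 | C
44 | 45 | 1 | 3 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 7.8792 | NaN | Q
142 | 143 | 1 | 3 | Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck) | female | 24.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S
341 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S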

Duplicate rows

Dataset A

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.